Practical

Geostatistics

In this practical we are going to fit a geostatisticsl model Encounter probability of Pacific Cod (Gadus macrocephalus) from a trawl survey.

In this practical we will:

  • Learn how to fit a spatial model in inlabru

  • Learn how to add spatial covariates to the model

  • Learn how to do predictions

  • Learn how to simulate from the fitted model


Libraries to load:

library(dplyr)
library(INLA)
library(inlabru) 
library(sf)
library(terra)

# load some libraries to generate nice map plots
library(scico)
library(ggplot2)
library(patchwork)
library(mapview)
library(tidyterra)

The data

In this practical, we will explore data on the Pacific Cod (Gadus macrocephalus) from a trawl survey in Queen Charlotte Sound. The pcod dataset is available from the sdmTMB package and contains the presence/absence records of the Pacific Cod during each surveys along with the biomass density of Pacific cod in the area swept (kg/Km\(^2\)). The qcs_grid data contain the depth values stored as \(2\times 2\) km grid for Queen Charlotte Sound.

The dataset contains presence/absence data from 2003 to 2017. In this practical we only consider year 2003.

We first load the dataset and select the year of interest

library(sdmTMB)

pcod_df = sdmTMB::pcod %>% filter(year==2003)
qcs_grid = sdmTMB::qcs_grid

Then, we create ab sf object and assigne the roght coordinate reference to it:

pcod_sf =   st_as_sf(pcod_df, coords = c("lon","lat"), crs = 4326)
pcod_sf = st_transform(pcod_sf,
                       crs = "+proj=utm +zone=9 +datum=WGS84 +no_defs +type=crs +units=km" )

We convert the covariate into a raster and assign the same coordinate reference:

depth_r <- rast(qcs_grid, type = "xyz")
crs(depth_r) <- crs(pcod_sf)

Finally we can plot our dataset. Note that to plot the raster we need to upload also the tidyterra library.

ggplot()+ 
  geom_spatraster(data=depth_r$depth)+
  geom_sf(data=pcod_sf,aes(color=factor(present))) +
    scale_color_manual(name="Occupancy status for the Pacific Cod",
                     values = c("black","orange"),
                     labels= c("Absence","Presence"))+
  scale_fill_scico(name = "Depth",
                   palette = "nuuk",
                   na.value = "transparent" ) + xlab("") + ylab("")

The model

We first fit a simple model where we consider the observation as Bernoulli and where the linear predictor contains only one intercept and the GR field defined through the SPDE approach. The model is defined as:

Stage 1 Model for the response \[ y(s)|\eta(s)\sim\text{Binom}(1, p(s)) \] Stage 2 Latent field model \[ \eta(s) = \text{logit}(p(s)) = \beta_0 + \omega(s) \] with \[ \omega(s)\sim \text{ GF with range } \rho\ \text{ and maginal variance }\ \sigma^2 \]

Stage 3 Hyperparameters

The hyperparameters of the model are \(\rho\) and \(\sigma\)

NOTE In this case the linear predictor \(\eta\) consists of two components!!

The workflow

When fitting a geostatistical model we need to fullfill the following tasks:

  1. Build the mesh

  2. Define the SPDE representation of the spatial GF. This includes defining the priors for the range and sd of the spatial GF

  3. Define the components of the linear predictot. This includes the spatial GF and all eventual covariates

  4. Define the observation model using the bru_obs() function

  5. Run the model using the bru() function

1. Building the mesh

The first task, when dealing with geostatistical models in inlabru is to build the mesh that covers the area of interest. For this purpose we use the function fm_messh_2d.

One way to build the mesh is to start from the locations where we have observations, these are contained in the dataset pcod_sf

mesh = fm_mesh_2d(loc = pcod_sf,           # Build the mesh
                  cutoff = 2,
                  max.edge = c(7,20),     # The largest allowed triangle edge length.
                  offset = c(5,50))       # The automatic extension distance
ggplot() + gg(mesh) + geom_sf(data= pcod_sf, aes(color = factor(present))) + xlab("") + ylab("")

Warning Task

Look at the documentation for the fm_mesh_2d function typing

?fm_mesh_2d

play around with the different options and create different meshes.

The rule of thumb is that your mesh should be:

  • fine enough to well represent the spatial variability of your process, but not too fine in order to avoid computation burden
  • the triangles should be regular, avoid long and thin triangles.
  • The mesh should contain a buffer around your area of interest (this is what is defined in the offset option) in order to avoid boundary artefact in the estimated variance.

2. Define the SPDE representation of the spatial GF

To define the SPDE representation of the spatial GF we use the function inla.spde2.pcmatern. This takes as input the mesh we have defined and the PC-priors definition for \(\rho\) and \(\sigma\) (the range and the marginal standard deviation of the field).

PC priors Gaussian Random field are defined in (Fuglstad et al. 2018). From a practical perspective for the range \(\rho\) you need to define two paramters \(\rho_0\) and \(p_{\rho}\) such that you believe it is reasonable that \[ P(\rho<\rho_0)=p_{\rho} \] while for the margianal variance \(\sigma\) you need to define two parameters \(\sigma_0\) and \(p_{\sigma}\) such that you believe it is reasonable that \[ P(\sigma<\sigma_0)=p_{\sigma} \]

You can use the following function to compute and plot the prior distributions for the range and sd of the Matern field.

dens_prior_range = function(rho_0, p_alpha)
{
  # compute the density of the PC prior for the
  # range rho of the Matern field
  # rho_0 and p_alpha are defined such that
  # P(rho<rho_0) = p_alpha
  rho = seq(0, rho_0*10, length.out =100)
  alpha1_tilde = -log(p_alpha) * rho_0
  dens_rho =  alpha1_tilde / rho^2 * exp(-alpha1_tilde / rho)
  return(data.frame(x = rho, y = dens_rho))
}

dens_prior_sd = function(sigma_0, p_sigma)
{
  # compute the density of the PC prior for the
  # sd sigma of the Matern field
  # sigma_0 and p_sigma are defined such that
  # P(sigma>sigma_0) = p_sigma
  sigma = seq(0, sigma_0*10, length.out =100)
  alpha2_tilde = -log(p_sigma)/sigma_0
  dens_sigma = alpha2_tilde* exp(-alpha2_tilde * sigma) 
  return(data.frame(x = sigma, y = dens_sigma))
}

Here are some alternatives for defining priors for our model

spde_model1 =  inla.spde2.pcmatern(mesh,
                                  prior.sigma = c(.1, 0.5),
                                  prior.range = c(30, 0.5))
spde_model2 =  inla.spde2.pcmatern(mesh,
                                  prior.sigma = c(10, 0.5),
                                  prior.range = c(1000, 0.5))
spde_model3 =  inla.spde2.pcmatern(mesh,
                                  prior.sigma = c(1, 0.5),
                                  prior.range = c(100, 0.5))

And here we plot the different priors for the range:

ggplot() + geom_line(data = dens_prior_range(30,.5), aes(x,y, color = "model1")) +
geom_line(data = dens_prior_range(1000,.5), aes(x,y, color = "model2")) +
geom_line(data = dens_prior_range(100,.5), aes(x,y, color = "model3")) 

and for the sd:

ggplot() + geom_line(data = dens_prior_range(1,.5), aes(x,y, color = "model1")) +
geom_line(data = dens_prior_range(10,.5), aes(x,y, color = "model2")) +
geom_line(data = dens_prior_range(.1,.5), aes(x,y, color = "model3")) 

Warning Task

Consider the pcod_sf, the spatial extension and type of the data…is some of the previous choices more resonable than other?

NOTE Remember that a prior should be reasonable..but the model should not totally depend on it.

3. Define the components of the linear predictor

We have now defined a mesh and a SPDE representation of the spatial GF. We now need to define the model components:

cmp = ~ Intercept(1) + space(geometry, model = spde_model3)

NOTE since the dataframe we use (pcod_sf) is an sf object the input in the space() component is the geometry of the dataset.

4. Define the observation model

Our data are Bernulli distributed so we can define the observation model as:

formula = present ~ Intercept  + space

lik = bru_obs(formula = formula, 
              data = pcod_sf, 
              family = "binomial")

5. Run the model

Finally we are ready to run the model

fit1 = bru(cmp,lik)

Extract results

Hyperparameters

Warning Task

What are the posterior for the range \(\rho\) and the standard deviation \(\sigma\)? Plot the posterior together with the prior for both parameters.

Posterior marginals for the hyperparameters can be extracted from the fitted model as:

Code
fit1$marginals.hyperpar$'name of the parameter'
Code
# Extract marginal for the range

ggplot() + 
  geom_line(data = fit1$marginals.hyperpar$`Range for space`, aes(x,y, color = "Posterior")) +
  geom_line(data = dens_prior_range(100,.5), aes(x,y, color = "Prior"))


ggplot() + 
  geom_line(data = fit1$marginals.hyperpar$`Stdev for space`, aes(x,y, color = "Posterior")) +
  geom_line(data = dens_prior_sd(1,.5), aes(x,y, color = "Prior"))

Spatial prediction

We now want to extract the estimated posterior mean and sd of spatial GF. To do this we first need to define a grid of points where we want to predict. We do this using the function fm_pixel() which creates a regular grid of points covering the mesh

pxl = fm_pixels(mesh)

then compute the prediction for both the spatial GF and the linear predictor (spatial GF + intercept)

preds = predict(fit1, pxl, ~data.frame(spatial = space,
                                      total = Intercept + space))

Finally, we can plot the maps

ggplot() + geom_sf(data = preds$spatial,aes(color = mean)) + scale_color_scico() + ggtitle("Posterior mean")

ggplot() + geom_sf(data = preds$spatial,aes(color = sd)) + scale_color_scico() + ggtitle("Posterior sd")

Note The posterior sd is lowest at the observation points. Note how the posterior sd is inflated around the border, this is the “border effect” due to the SPDE representation.

Instead of predicting over a grid covering the whole mesh, we can limit our predictions to the points where the covariate is defined. We can do this by defining a sf object using coordinates in the object depth_r.

pxl1 = data.frame(crds(depth_r), 
                  as.data.frame(depth_r$depth)) %>% 
       filter(!is.na(depth)) %>%
st_as_sf(coords = c("x","y"))
Warning Task

Produce prediction over pxl1 unsing the same techniques as before. Plot your results.

Add hint details here…

Code
pred_pxl1 = predict(fit1, pxl1, ~data.frame(spatial = space,
                                      total = Intercept + space))

ggplot() + geom_sf(data = pred_pxl1$total,aes(color = mean)) + scale_color_scico() + ggtitle("Posterior mean")

Code
ggplot() + geom_sf(data = pred_pxl1$total,aes(color = sd)) + scale_color_scico() + ggtitle("Posterior sd")

Instead of computing the posterior mean and standard deviations of the estimated surface, one can also simulate possible realizations of such surface. This will give the user a better idea of the type of realized surfaces one can expect. We can do this using the function generate().

# we simulate 4 samples from the 
gens = generate(fit1, pxl1, ~ (Intercept + space),
                n.samples = 4)

pp = cbind(pxl1, gens)

pp %>% select(-depth) %>%
  pivot_longer(-geometry) %>%
    ggplot() + 
      geom_sf(aes(color = value)) +
      facet_wrap(.~name) +
        scale_color_scico(direction = -1) +
        ggtitle("Sample from the fitted model")

An alternative model

We now want to check if the depth covatiate has an influende on the probability of presence. We do this in two different models

  1. Model 1 The depth enters the model in a linear way. The linear predictor is then defined as:

\[ \eta(s) = \text{logit}(p(s)) = \beta_0 + \omega(s) + \beta_1\ \text{depth}(s) \]

  1. Model 1 The depth enters the model in a non linear way. The linear predictor is then defined as:

\[ \eta(s) = \text{logit}(p(s)) = \beta_0 + \omega(s) + f(\text{depth}(s)) \] where \(f(.)\) is a smooth function. We will use a RW2 model for this.

Warning Task

Fit model 1. Define components, observation model and use the bru() function to estimate the parameters.

Note Use the scaled version of the covariate stored in depth_r$depth_scaled.

What is the liner effect of depth on the logit probability?

Add hint details here…

Code
cmp = ~ Intercept(1) + space(geometry, model = spde_model3) +
        covariate(depth_r$depth_scaled, model = "linear")

formula = present ~ Intercept  + space + covariate

lik = bru_obs(formula = formula, 
              data = pcod_sf, 
              family = "binomial")


fit2 = bru(cmp, lik)

We now want to fit Model 2 where we allow the effect of depth to be non-linear. To use the RW2 model we need to group the values of depth into distinct classe. To do this we use the function inla.group() which, by default, creates 20 groups. The we can fit the model as usual

# create the grouped variable
depth_r$depth_group = inla.group(values(depth_r$depth_scaled))

# run the model
cmp = ~ Intercept(1) + space(geometry, model = spde_model3) +
        covariate(depth_r$depth_group, model = "rw2")

formula = present ~ Intercept  + space + covariate

lik = bru_obs(formula = formula, 
              data = pcod_sf, 
              family = "binomial")


fit3 = bru(cmp, lik)

# plot the estimated effect of depth

fit3$summary.random$covariate %>% ggplot() + geom_line(aes(ID,mean)) + 
                                  geom_ribbon(aes(ID, ymin = `0.025quant`, 
                                                      ymax = `0.975quant`), alpha = 0.5)

Warning Task

Create a map of predicted probability from Model 3. You can use a inverse logit function defined as

inv_logit = function(x) (1+exp(-x))^(-1)

The predict() function can take as input also functions of elements of the components you want to consider

Code
inv_logit = function(x) (1+exp(-x))^(-1)

pred3  = predict(fit3, pxl1, ~inv_logit(Intercept + space + covariate) )

pred3 %>% ggplot() + 
      geom_sf(aes(color = mean)) +
        scale_color_scico(direction = -1) +
        ggtitle("Sample from the fitted model")